Abstract. We present a low-effort program transformation to improve
the efficiency of computations over free monads in Haskell. The development
is calculational and carried out in a generic setting, thus applying
to a variety of datatypes. An important aspect of our approach is the
utilisation of type class mechanisms to make the transformation as transparent
as possible, requiring no restructuring of code at all. There is also
no extra support necessary from the compiler (apart from an up-to-date
type checker). Despite this simplicity of use, our technique is able to
achieve true asymptotic runtime improvements. We demonstrate this by
examples for which the complexity is reduced from quadratic to linear.

1 Introduction 

Monads [1] have become everyday structures for Haskell programmers to work
with. Not only do monads allow impure features of the programming language
to be encapsulated safely [2,3], but they are also used in pure code to separate
concerns and provide modular design [4,5]. But, as usual in software construction,
modularity comes at a cost, typically with respect to program efficiency.
We propose a method to improve the efficiency of code over a large variety of
monads. A distinctive feature is that this method is non-intrusive: it preserves
the appearance of code, with the obvious software engineering benefits.

Since our approach is best introduced by considering a concrete example, 
illustrating both the problem we address and our key ideas, that is exactly what 
we do in the next section. Thereafter, Sect. 3 develops the approach formally, 
embracing a generic programming style. Further example material is provided in 
Sects. 4 and 5, where the latter emphasises comparison to related work, before 
Sect. 6 concludes. 

The code that we present throughout requires some extensions over the
Haskell 98 standard, in particular rank-2 polymorphism and multi-parameter
type constructor classes. It was tested against both GHC (version 6.6, flag
-fglasgow-exts) and Hugs (version of March 2005, flag -98), and is available
online at http://wwwtcs.inf.tu-dresden.de/~voigt/Improve.lhs.

2 A Specific Example

We first study a simple and somewhat artificial example of the kind of transformation
we want to achieve. This prepares the ground for the more generic
development in the next section, and more practical examples later on.
Consider the following datatype of binary, leaf-labelled trees: 

data Tree a = Leaf a | Node (Tree a) (Tree a) 

An important operation on such trees is substituting leaves by trees depending 
on their labels, defined by structural induction as follows: 

subst :: Tree a -> (a -> Tree b) -> Tree b
subst (Leaf a) k = k a
subst (Node t1 t2) k = Node (subst t1 k) (subst t2 k)

Note that the type of labels might change during such a substitution. 
It is well-known that trees with substitution form a monad. That is, 

instance Monad Tree where
  return = Leaf
  (>>=) = subst

defines an instance of the following type constructor class: 

class Monad μ where
  return :: a -> μ a
  (>>=) :: μ a -> (a -> μ b) -> μ b

where the following three laws hold: 

(return a >>= k) = k a    (1)
(m >>= return) = m    (2)
((m >>= k) >>= h) = (m >>= (\a -> k a >>= h))    (3)

An example use of the monad instance given above is the following program 
generating trees like those in Fig. 1: 

fullTree :: Int -> Tree Int
fullTree 1 = Leaf 1
fullTree (n+1) =
  do
    i <- fullTree n
    Node (Leaf (n-i)) (Leaf (i+1))

Note that the second equation is equivalent to

fullTree (n+1) = fullTree n >>= \i -> Node (Leaf (n-i)) (Leaf (i+1))

and thus to

fullTree (n+1) = subst (fullTree n) (\i -> Node (Leaf (n-i)) (Leaf (i+1)))

Fig. 1. fullTree 1, fullTree 2, fullTree 3, fullTree 4


This means that to create, for example, the tree fullTree 4, the following expression
is eventually evaluated:

subst (subst (subst (Leaf 1) ...) ...) ...    (4)

The nested calls to subst mean that prefix fragments of the overall tree structure
will be traversed again and again.

In general, the asymptotic time complexity of computing fullTree n is of the
order 2^n, which is no surprise as the size of the finally computed output is of
that order as well. But a more interesting effect occurs when that output is only
partially demanded, such as by the following function:

zigzag :: Tree Int -> Int
zigzag = zig
  where
    zig (Leaf n)     = n
    zig (Node t1 t2) = zag t1
    zag (Leaf n)     = n
    zag (Node t1 t2) = zig t2

Now the expression 

zigzag (fullTree n) (5) 

has quadratic runtime in n despite exploring only a single tree path of length
linear in n. To see where the quadratic runtime comes from, consider the following
partial reduction sequence for zigzag (fullTree 4), starting from (4) and
slightly reordering the lazy evaluation steps so that first, and only, those
involving subst are recorded:

zigzag (subst (subst (subst (Leaf 1) ...) ...) ...)
=>   zigzag (subst (subst (Node (Leaf 0) (Leaf 2)) ...) ...)
=>^2 zigzag (subst (Node (Node (Leaf 2) (Leaf 1)) (subst (Leaf 2) ...)) ...)
=>^3 zigzag (Node (Node (subst (Leaf 2) ...) (Node (Leaf 2) (Leaf 2))) ...)
=>*  ...

The challenge is to bring the overall runtime for (5) down to linear, but to do 
so without changing the structure of the code for fullTree. 
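
As a minimal sketch, the definitions of this section so far can be assembled into a module that a recent GHC accepts. Two things here are our assumptions rather than part of the development above: the Functor and Applicative instances (current compilers require them as superclasses of Monad), and the guard that replaces the n+k pattern, which newer language standards no longer support.

```haskell
-- Sketch for a recent GHC; the paper's own code targets GHC 6.6 and Hugs.
data Tree a = Leaf a | Node (Tree a) (Tree a)
  deriving (Eq, Show)

-- Substitute every leaf by the tree that k assigns to its label.
subst :: Tree a -> (a -> Tree b) -> Tree b
subst (Leaf a)     k = k a
subst (Node t1 t2) k = Node (subst t1 k) (subst t2 k)

-- Superclass instances (an addition required by current GHC versions).
instance Functor Tree where
  fmap f t = subst t (Leaf . f)

instance Applicative Tree where
  pure      = Leaf
  tf <*> ta = subst tf (\f -> fmap f ta)

instance Monad Tree where
  (>>=) = subst

-- fullTree without n+k patterns; only defined for arguments >= 1.
fullTree :: Int -> Tree Int
fullTree 1 = Leaf 1
fullTree n
  | n > 1 = do
      i <- fullTree (n - 1)
      Node (Leaf (n - 1 - i)) (Leaf (i + 1))
  | otherwise = error "fullTree: argument must be at least 1"

-- Follow a single path, alternating between left and right subtrees.
zigzag :: Tree Int -> Int
zigzag = zig
  where
    zig (Leaf n)    = n
    zig (Node t1 _) = zag t1
    zag (Leaf n)    = n
    zag (Node _ t2) = zig t2
```

Evaluating zigzag (fullTree n) for growing n in GHCi is one way to observe the quadratic behaviour described above.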

The situation here is similar to that in the well-known definition of naive list
reversal, where left-associatively nested appends cause quadratic runtime. And
in fact there is a similar cure. We can create an alternative representation of
trees somewhat akin to the "novel representation of lists" of Hughes [6], also
known as difference lists. Just as the latter abstract over the end of a list, we
abstract over the leaves of a tree as follows:

newtype CTree a = CTree (forall b. (a -> Tree b) -> Tree b)

The connection between ordinary trees and their alternative representation is 
established by the following two functions: 

rep :: Tree a -> CTree a
rep t = CTree (subst t)

abs :: CTree a -> Tree a
abs (CTree p) = p Leaf

We easily have abs . rep = id. Moreover, the representation type itself forms a
monad as follows:

instance Monad CTree where
  return a      = CTree (\h -> h a)
  CTree p >>= k = CTree (\h -> p (\a -> case k a of CTree q -> q h))

But to use it as a drop-in replacement for Tree in the definition of fullTree, the
type constructor CTree must support not only the monad operations, but also
the actual construction of (representations of) non-leaf trees. To capture this
requirement, we introduce the following type constructor class:

class Monad μ => TreeLike μ where
  node :: μ a -> μ a -> μ a

For ordinary trees, the instance definition is trivial:

instance TreeLike Tree where
  node = Node

For the alternative representation, we have to take care to propagate the
abstracted-over leaf replacement function appropriately. This is achieved as follows:

instance TreeLike CTree where
  node (CTree p1) (CTree p2) = CTree (\h -> Node (p1 h) (p2 h))

For convenience, we also define an abstract version of the Leaf constructor.

leaf :: TreeLike μ => a -> μ a
leaf = return

Now, we can easily give a variant of fullTree that is independent of the choice of 
trees to work with. 

fullTree' :: TreeLike μ => Int -> μ Int
fullTree' 1 = leaf 1
fullTree' (n+1) =
  do
    i <- fullTree' n
    node (leaf (n-i)) (leaf (i+1))

Note that the code structure is exactly as before. Moreover,

zigzag (fullTree' n)    (6)

still needs quadratic runtime. Indeed, GHC 6.6 with optimisation settings pro- 
duces exactly the same compiled code for (5) and (6). Nothing magical has 
happened yet: any overhead related to the type class abstraction in (6) is simply 
optimised away. So there appears to be neither a gain from, nor a penalty for 
switching from fullTree to fullTree'. Why the (however small) effort, then? 
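
The pieces introduced so far can be put together as follows. This is only a sketch for a recent GHC: the RankNTypes pragma, the Functor and Applicative instances, the hiding of the Prelude's abs, and the guard in place of the n+k pattern are our additions, not part of the code as tested with GHC 6.6. ASCII names m, a, b stand in for the Greek type variables.

```haskell
{-# LANGUAGE RankNTypes #-}
import Prelude hiding (abs)

data Tree a = Leaf a | Node (Tree a) (Tree a)
  deriving (Eq, Show)

subst :: Tree a -> (a -> Tree b) -> Tree b
subst (Leaf a)     k = k a
subst (Node t1 t2) k = Node (subst t1 k) (subst t2 k)

instance Functor Tree where
  fmap f t = subst t (Leaf . f)
instance Applicative Tree where
  pure      = Leaf
  tf <*> ta = subst tf (\f -> fmap f ta)
instance Monad Tree where
  (>>=) = subst

-- Trees abstracted over what happens at their leaves.
newtype CTree a = CTree (forall b. (a -> Tree b) -> Tree b)

rep :: Tree a -> CTree a
rep t = CTree (subst t)

abs :: CTree a -> Tree a
abs (CTree p) = p Leaf

instance Functor CTree where
  fmap f (CTree p) = CTree (\h -> p (h . f))
instance Applicative CTree where
  pure a                = CTree (\h -> h a)
  CTree pf <*> CTree pa = CTree (\h -> pf (\f -> pa (h . f)))
instance Monad CTree where
  CTree p >>= k = CTree (\h -> p (\a -> case k a of CTree q -> q h))

class Monad m => TreeLike m where
  node :: m a -> m a -> m a

instance TreeLike Tree where
  node = Node
instance TreeLike CTree where
  node (CTree p1) (CTree p2) = CTree (\h -> Node (p1 h) (p2 h))

leaf :: TreeLike m => a -> m a
leaf = return

-- Guarded variant of fullTree'; only defined for arguments >= 1.
fullTree' :: TreeLike m => Int -> m Int
fullTree' 1 = leaf 1
fullTree' n
  | n > 1 = do
      i <- fullTree' (n - 1)
      node (leaf (n - 1 - i)) (leaf (i + 1))
  | otherwise = error "fullTree': argument must be at least 1"
```

With these definitions, fullTree' 4 :: Tree Int builds the tree directly, whereas abs (fullTree' 4) builds the same tree via the CTree instance.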

The point is that we can now switch to an asymptotically more efficient
version with almost zero effort. It is as simple as writing

zigzag (improve (fullTree' n)) ,    (7)

where all the "magic" lies with the following function:

improve :: (forall μ. TreeLike μ => μ a) -> Tree a
improve m = abs m

In contrast to (5) and (6), evaluation of (7) has runtime only linear in n.

The rationale for the type of improve, as well as the correctness of the above
transformation in the sense that (7) always computes the same output as (6),
will be discussed in the next section, all for a more general setting than the
specific type of trees and the example considered here.

We end the current section by pointing out that (7) is compiled (again by
GHC 6.6) to code corresponding to

zigzag (fullTree'' n Leaf) ,

where:

fullTree'' :: Int -> (Int -> Tree b) -> Tree b
fullTree'' 1 h = h 1
fullTree'' (n+1) h = fullTree'' n (\i -> Node (h (n-i)) (h (i+1)))

This should make apparent why the runtime is now only of linear complexity.

3 The Generic Setting

To deal with a variety of different datatypes in one stroke, we use the by now
folklore approach of two-level types [7,8].

A functor is an instance of the following type constructor class:

class Functor φ where
  fmap :: (a -> b) -> φ a -> φ b

satisfying the following two laws:

fmap id t = t    (8)
fmap f (fmap g t) = fmap (f . g) t    (9)


Given such an instance, the corresponding free monad (capturing terms containing
variables, along with a substitution operation) is defined as follows:

data Free φ a = Return a | Wrap (φ (Free φ a))

instance Functor φ => Monad (Free φ) where
  return         = Return
  Return a >>= k = k a
  Wrap t >>= k   = Wrap (fmap (>>= k) t)

Of course, we want to be sure that the laws (1)-(3) hold for the instance just
defined. While law (1) is obvious from the definitions, the other two require
fixpoint induction and laws (8) and (9).

As an example, consider the following functor: 

data F b = N b b

instance Functor F where
  fmap h (N x y) = N (h x) (h y)

Then Free F corresponds to Tree from Sect. 2, and the monad instances agree.
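
The correspondence between Free F and Tree can be made concrete by a pair of conversion functions. The following is a sketch only; the names toTree and fromTree are hypothetical, and the Functor and Applicative instances for Free are additions that current GHC versions require.

```haskell
-- The free monad over a functor f (written with ASCII type variables).
data Free f a = Return a | Wrap (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Return a) = Return (g a)
  fmap g (Wrap t)   = Wrap (fmap (fmap g) t)

instance Functor f => Applicative (Free f) where
  pure = Return
  Return g <*> x = fmap g x
  Wrap t   <*> x = Wrap (fmap (<*> x) t)

instance Functor f => Monad (Free f) where
  Return a >>= k = k a
  Wrap t   >>= k = Wrap (fmap (>>= k) t)

-- The functor of binary branching.
data F b = N b b

instance Functor F where
  fmap h (N x y) = N (h x) (h y)

-- The trees of Sect. 2, repeated here to keep the sketch self-contained.
data Tree a = Leaf a | Node (Tree a) (Tree a)
  deriving (Eq, Show)

-- Hypothetical helpers witnessing the correspondence of Free F and Tree.
toTree :: Free F a -> Tree a
toTree (Return a)     = Leaf a
toTree (Wrap (N l r)) = Node (toTree l) (toTree r)

fromTree :: Tree a -> Free F a
fromTree (Leaf a)   = Return a
fromTree (Node l r) = Wrap (N (fromTree l) (fromTree r))
```

Under this reading, >>= on Free F performs exactly the leaf substitution that subst performs on Tree.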

Back to the generic setting. What was abstraction over leaves in the previous 
section, now becomes abstraction over the return method of a monad. This 
abstraction is actually possible for arbitrary, rather than only for free monads. 
The straight-forward definitions are as follows: 

newtype C μ a = C (forall b. (a -> μ b) -> μ b)

rep :: Monad μ => μ a -> C μ a
rep m = C (m >>=)

abs :: Monad μ => C μ a -> μ a
abs (C p) = p return

instance Monad (C μ) where
  return a  = C (\h -> h a)
  C p >>= k = C (\h -> p (\a -> case k a of C q -> q h))

Even though the monad laws do hold for the latter instance, we will not need
this fact later on. What we will need, however, is the abs . rep = id property:

  abs (rep m)
= by definition of rep
  abs (C (m >>=))
= by definition of abs    (10)
  m >>= return
= by law (2) for the instance Monad μ
  m



We also need to establish connections between the methods of the instances
Monad μ and Monad (C μ). For return, we have:

  rep (return a)
= by definition of rep
  C (return a >>=)
= by definition of sectioning
  C (\h -> return a >>= h)    (11)
= by law (1) for the instance Monad μ
  C (\h -> h a)
= by definition of return for the instance Monad (C μ)
  return a

Note that the occurrences of return in the first few lines refer to the instance
Monad μ, whereas the return in the last line lives in the instance Monad (C μ).
For the other method of the Monad class, we get the following distribution-like
property:

  rep (m >>= k)
= by definition of rep
  C ((m >>= k) >>=)
= by definition of sectioning
  C (\h -> (m >>= k) >>= h)
= by law (3) for the instance Monad μ
  C (\h -> m >>= (\a -> k a >>= h))
= by case-of-known    (12)
  C (\h -> m >>= (\a -> case C (k a >>=) of C q -> q h))
= by definition of rep
  C (\h -> m >>= (\a -> case rep (k a) of C q -> q h))
= by definition of >>= for the instance Monad (C μ)
  C (m >>=) >>= (rep . k)
= by definition of rep
  rep m >>= (rep . k)
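
Properties (10)-(12) can be spot-checked on a concrete monad. The sketch below assumes a recent GHC (hence the RankNTypes pragma and the extra Functor and Applicative instances, which are our additions) and uses ASCII type variables. Applying abs to both sides of (11) and (12) yields statements that can be compared with ==, for example in the list monad.

```haskell
{-# LANGUAGE RankNTypes #-}
import Prelude hiding (abs)

-- A monadic value abstracted over its continuation.
newtype C m a = C (forall b. (a -> m b) -> m b)

rep :: Monad m => m a -> C m a
rep m = C (m >>=)

abs :: Monad m => C m a -> m a
abs (C p) = p return

-- Superclass instances required by current GHC versions.
instance Functor (C m) where
  fmap f (C p) = C (\h -> p (h . f))

instance Applicative (C m) where
  pure a        = C (\h -> h a)
  C pf <*> C pa = C (\h -> pf (\f -> pa (h . f)))

-- No Monad constraint on m is needed for this instance.
instance Monad (C m) where
  C p >>= k = C (\h -> p (\a -> case k a of C q -> q h))
```

For instance, with m = [1,2,3] and k x = [x, x*10], the expressions abs (rep m >>= (rep . k)) and m >>= k denote the same list, as (12) and (10) together predict.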

Next, we need support for expressing the construction of non-return values
in both monads Free φ and C (Free φ). To this end, we introduce the following
multi-parameter type constructor class:

class (Functor φ, Monad μ) => FreeLike φ μ where
  wrap :: φ (μ a) -> μ a

As in Sect. 2, one instance definition is trivial:

instance Functor φ => FreeLike φ (Free φ) where
  wrap = Wrap

The other one takes a bit more thinking, but will ultimately be justified by the
succeeding calculations.

instance FreeLike φ μ => FreeLike φ (C μ) where
  wrap t = C (\h -> wrap (fmap (\(C p) -> p h) t))

Similarly as for the monad methods before, we would like to prove distribution of
rep over wrap, thus establishing a connection between instances FreeLike φ μ and
FreeLike φ (C μ). More specifically, we expect rep (wrap t) = wrap (fmap rep t).
However, a straightforward calculation from both sides gets stuck somewhere in
the middle as follows:

  rep (wrap t)
= by definition of rep
  C (wrap t >>=)
= by definition of sectioning
  C (\h -> wrap t >>= h)
= by ???
  C (\h -> wrap (fmap (>>= h) t))
= by definition of sectioning
  C (\h -> wrap (fmap (\m -> m >>= h) t))
= by case-of-known
  C (\h -> wrap (fmap (\m -> (\(C p) -> p h) (C (m >>=))) t))
= by definition of rep
  C (\h -> wrap (fmap (\m -> (\(C p) -> p h) (rep m)) t))
= by law (9) for the instance Functor φ
  C (\h -> wrap (fmap (\(C p) -> p h) (fmap rep t)))
= by definition of wrap for the instance FreeLike φ (C μ)
  wrap (fmap rep t)

On reflection, this is not so surprising, since it was to be expected that at some
point we really need to consider the more specific Free φ versus C (Free φ)
rather than the more general (and thus less informative) μ versus C μ as done
for (10)-(12). Here now this point has come, and indeed we can reason for t of
type φ (Free φ a) as follows:

  rep (wrap t)
= as above
  C (\h -> wrap t >>= h)
= by definition of wrap for the instance FreeLike φ (Free φ)
  C (\h -> Wrap t >>= h)
= by definition of >>= for the instance Monad (Free φ)    (13)
  C (\h -> Wrap (fmap (>>= h) t))
= by definition of wrap for the instance FreeLike φ (Free φ)
  C (\h -> wrap (fmap (>>= h) t))
= as above
  wrap (fmap rep t)



Our "magic function" is again the same as abs up to typing:

improve :: Functor φ => (forall μ. FreeLike φ μ => μ a) -> Free φ a
improve m = abs m

In fact, comparing their types should be instructive. Recall that

abs :: Monad μ => C μ a -> μ a .

This type is different from that of improve in two ways. The first, and less essential,
one is that abs is typed with respect to an arbitrary monad μ, whereas ultimately
we want to consider the more specific case of monads of the form Free φ.
Of course, by simple specialisation, abs admits the following type as well:

abs :: Functor φ => C (Free φ) a -> Free φ a .

But, more essentially, the input type of improve is forall μ. FreeLike φ μ => μ a,
which puts stronger requirements on the argument m than just C (Free φ) a
would do. And that is what finally enables us to establish the correctness of
adding improve at will wherever the type checker allows doing so. The reasoning,
in brief, is as follows:

  improve m
= by definition of improve
  abs m
= by (11)-(13)
  abs (rep m)
= by (10)
  m

To understand in more detail what is going on here, it is particularly helpful to
examine the type changes that m undergoes in the above calculation.

1. In the first line, m has the type forall μ. FreeLike φ μ => μ a (for some fixed
instance Functor φ and some fixed a), because that is what the type of
improve forces it to be.

2. In the second line, m has the type C (Free φ) a, because that is what the
type of abs forces it to be, taking into account that the overall expression
in each line must have the type Free φ a. When going from left to
right in the definition of improve, the type of m is thus specialised from
forall μ. FreeLike φ μ => μ a to C (Free φ) a. This is possible, and done silently
(by the type checker), since an instance FreeLike φ (C (Free φ)) follows
from the existing instance declarations Functor φ => FreeLike φ (Free φ)
and FreeLike φ μ => FreeLike φ (C μ).

3. In the third line, m has the type Free φ a, because that is what the types
of abs and rep force it to be. That type is an alternative specialisation of the
original type forall μ. FreeLike φ μ => μ a of m, possible due to the instance
declaration Functor φ => FreeLike φ (Free φ). The key observation about the
second versus third lines is that even though m has been type-specialised in
two different ways, the definition (or value) of m is still the same as in the first
line. And since there it has the very general type forall μ. FreeLike φ μ => μ a,
we know that m cannot be built from any μ-related operations for any specific
μ. Rather, its μ-structure must be made up from the overloaded operations
return, >>=, and wrap only. And since rep distributes over all of these
by (11)-(13), we have

rep (m :: Free φ a) = (m :: C (Free φ) a)

A more formal proof would require techniques akin to those used for deriving
so-called free theorems [9,10].

4. In the fourth line, m still has the type Free φ a.

The essence of all the above is that improve m can be used wherever a value
of type Free φ a is expected, but that m itself must (also) have the more general
type forall μ. FreeLike φ μ => μ a, and that then improve m is equivalent to
just m, appropriately type-specialised. Or, put differently, wherever we have a
value of type Free φ a which is constructed in a sufficiently abstract way (via
the overloaded operators return, >>=, and wrap) that it could also be given
the type forall μ. FreeLike φ μ => μ a, we can apply improve to that value without
changing program semantics. Yet another perspective is that improve is simply
a type conversion function that replaces a type specialisation which would
otherwise be performed anyway, but silently, and that has the nice side effect of
potentially improving the asymptotic runtime of a program (when left-associatively
arranged calls to >>= cause quadratic overhead).

Having studied the generic setting, we can once more return to the specific
example from Sect. 2. As already mentioned, the functor F given earlier in the
current section yields Free F corresponding to Tree. Moreover, in light of the
further general definitions from the current section, F and its functor instance
definition also give us all the remaining ingredients of our improvement approach
for binary, leaf-labelled trees. In particular, the type constructor C (Free F)
corresponds to CTree, and FreeLike F takes the role of TreeLike. There is no need
to provide any further definitions, since all the type constructor class instances
that are needed are automatically obtained from the mentioned single one, and
our generic definition of improve similarly covers the earlier more specific
one. In the next sections we will benefit from this genericity repeatedly.
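
To illustrate how the generic definitions instantiate to the tree example, the following sketch collects them in one module for a recent GHC. The language pragmas, the Functor and Applicative instances, the guard replacing the n+k pattern, and the helper leaves are assumptions or additions of ours; type variables are written in ASCII (f for the functor, m for the monad).

```haskell
{-# LANGUAGE RankNTypes, MultiParamTypeClasses, FlexibleInstances, FlexibleContexts #-}
import Prelude hiding (abs)

data Free f a = Return a | Wrap (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Return a) = Return (g a)
  fmap g (Wrap t)   = Wrap (fmap (fmap g) t)
instance Functor f => Applicative (Free f) where
  pure = Return
  Return g <*> x = fmap g x
  Wrap t   <*> x = Wrap (fmap (<*> x) t)
instance Functor f => Monad (Free f) where
  Return a >>= k = k a
  Wrap t   >>= k = Wrap (fmap (>>= k) t)

newtype C m a = C (forall b. (a -> m b) -> m b)

instance Functor (C m) where
  fmap g (C p) = C (\h -> p (h . g))
instance Applicative (C m) where
  pure a        = C (\h -> h a)
  C pf <*> C pa = C (\h -> pf (\g -> pa (h . g)))
instance Monad (C m) where
  C p >>= k = C (\h -> p (\a -> case k a of C q -> q h))

abs :: Monad m => C m a -> m a
abs (C p) = p return

class (Functor f, Monad m) => FreeLike f m where
  wrap :: f (m a) -> m a
instance Functor f => FreeLike f (Free f) where
  wrap = Wrap
instance FreeLike f m => FreeLike f (C m) where
  wrap t = C (\h -> wrap (fmap (\(C p) -> p h) t))

improve :: Functor f => (forall m. FreeLike f m => m a) -> Free f a
improve m = abs m

-- The tree example: F plays the role of binary branching.
data F b = N b b
instance Functor F where
  fmap h (N x y) = N (h x) (h y)

node :: FreeLike F m => m a -> m a -> m a
node t1 t2 = wrap (N t1 t2)

fullTree' :: FreeLike F m => Int -> m Int
fullTree' 1 = return 1
fullTree' n
  | n > 1 = do
      i <- fullTree' (n - 1)
      node (return (n - 1 - i)) (return (i + 1))
  | otherwise = error "fullTree': argument must be at least 1"

zigzag :: Free F Int -> Int
zigzag = zig
  where
    zig (Return n)      = n
    zig (Wrap (N t1 _)) = zag t1
    zag (Return n)      = n
    zag (Wrap (N _ t2)) = zig t2

-- Hypothetical helper: leaf labels from left to right.
leaves :: Free F a -> [a]
leaves (Return a)     = [a]
leaves (Wrap (N l r)) = leaves l ++ leaves r
```

Here zigzag (improve (fullTree' n)) and zigzag (fullTree' n) denote the same number; only the former avoids the repeated traversals.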

4 A More Realistic Example 

Swierstra and Altenkirch [11] build a pure model of Haskell's teletype IO, with
the aim of enabling equational reasoning and automated testing. The monad
they use for this corresponds to Free F_IO for the following functor:

data F_IO b = GetChar (Char -> b) | PutChar Char b

instance Functor F_IO where
  fmap h (GetChar f)   = GetChar (h . f)
  fmap h (PutChar c x) = PutChar c (h x)

They then provide replacements of Haskell's getChar/putChar functions that produce
pure values of this modelling type rather than doing actual IO. We can do
so as well, catching up with Listing 1 of [11].

getChar :: FreeLike F_IO μ => μ Char
getChar = wrap (GetChar return)

putChar :: FreeLike F_IO μ => Char -> μ ()
putChar c = wrap (PutChar c (return ()))

The only differences of note are the more general return types of our versions
of getChar and putChar. Just as the original function versions, our versions can
be used to specify any interaction. For example, we can express the following
computation:

revEcho :: FreeLike F_IO μ => μ ()
revEcho =
  do
    c <- getChar
    when (c /= ' ') $
      do
        revEcho
        putChar c

Run against the standard Haskell definitions of getChar and putChar (and obviously,
then, with the different type signature revEcho :: IO ()), the above code
reads characters from the input until a space is encountered, after which the
sequence just read is written to the output in reverse order.

The point of Swierstra and Altenkirch's approach is to run the very same
code against the pure model instead. Computing its (or similar functions') behaviour
is done by a semantics they provide in their Listing 2 and which is virtually
replicated here (the only differences being two occurrences of Wrap):

data Output a = Read (Output a) | Print Char (Output a) | Finish a

data Stream a = Cons {hd :: a, tl :: Stream a}

run :: Free F_IO a -> Stream Char -> Output a
run (Return a) cs           = Finish a
run (Wrap (GetChar f)) cs   = Read (run (f (hd cs)) (tl cs))
run (Wrap (PutChar c p)) cs = Print c (run p cs)

Simulating a run of revEcho on some input stream, or indeed using QuickCheck [12] 
to analyse many such runs, takes the following form: 

run revEcho stream. 

It turns out that this requires runtime quadratic in the number of characters in 
stream before the first occurrence of a space. This holds both with our definitions 
and with those of [11]. So these two sets of definitions are not only equivalent 



with respect to the pure models and associated semantics they provide, but also 
in terms of efficiency. The neat twist in our setting, however, is that we can 
simply write 

run (improve revEcho) stream (14) 

to reduce the complexity from quadratic to linear. The manner in which the 
quadraticity vanishes here is actually very similar to that observed for the "zigzag 
after fullTree" example at the end of Sect. 2, so we refrain from giving the code 
to which (14) is eventually compiled. 
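
For experimentation, the teletype example can be assembled into a single module. The following is a sketch under the same assumptions as before (recent GHC, extra superclass instances, ASCII type variables); the helpers toStream and printed are hypothetical and serve only to feed an input and to read off the printed characters.

```haskell
{-# LANGUAGE RankNTypes, MultiParamTypeClasses, FlexibleInstances, FlexibleContexts #-}
import Prelude hiding (abs, getChar, putChar)
import Control.Monad (when)

data Free f a = Return a | Wrap (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Return a) = Return (g a)
  fmap g (Wrap t)   = Wrap (fmap (fmap g) t)
instance Functor f => Applicative (Free f) where
  pure = Return
  Return g <*> x = fmap g x
  Wrap t   <*> x = Wrap (fmap (<*> x) t)
instance Functor f => Monad (Free f) where
  Return a >>= k = k a
  Wrap t   >>= k = Wrap (fmap (>>= k) t)

newtype C m a = C (forall b. (a -> m b) -> m b)

instance Functor (C m) where
  fmap g (C p) = C (\h -> p (h . g))
instance Applicative (C m) where
  pure a        = C (\h -> h a)
  C pf <*> C pa = C (\h -> pf (\g -> pa (h . g)))
instance Monad (C m) where
  C p >>= k = C (\h -> p (\a -> case k a of C q -> q h))

class (Functor f, Monad m) => FreeLike f m where
  wrap :: f (m a) -> m a
instance Functor f => FreeLike f (Free f) where
  wrap = Wrap
instance FreeLike f m => FreeLike f (C m) where
  wrap t = C (\h -> wrap (fmap (\(C p) -> p h) t))

improve :: Functor f => (forall m. FreeLike f m => m a) -> Free f a
improve m = case m of C p -> p return   -- abs, inlined

data F_IO b = GetChar (Char -> b) | PutChar Char b
instance Functor F_IO where
  fmap h (GetChar f)   = GetChar (h . f)
  fmap h (PutChar c x) = PutChar c (h x)

getChar :: FreeLike F_IO m => m Char
getChar = wrap (GetChar return)

putChar :: FreeLike F_IO m => Char -> m ()
putChar c = wrap (PutChar c (return ()))

revEcho :: FreeLike F_IO m => m ()
revEcho = do
  c <- getChar
  when (c /= ' ') $ do
    revEcho
    putChar c

data Output a = Read (Output a) | Print Char (Output a) | Finish a
data Stream a = Cons { hd :: a, tl :: Stream a }

run :: Free F_IO a -> Stream Char -> Output a
run (Return a)           _  = Finish a
run (Wrap (GetChar f))   cs = Read (run (f (hd cs)) (tl cs))
run (Wrap (PutChar c p)) cs = Print c (run p cs)

-- Hypothetical helper: the given characters followed by endless spaces.
toStream :: String -> Stream Char
toStream = foldr Cons spaces
  where spaces = Cons ' ' spaces

-- Hypothetical helper: the characters printed during a run.
printed :: Output a -> String
printed (Read o)    = printed o
printed (Print c o) = c : printed o
printed (Finish _)  = ""
```

Comparing printed (run revEcho s) with printed (run (improve revEcho) s) for long inputs s is one way to observe the difference in runtime while confirming that the outputs agree.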

It is worth pointing out that the nicely general type of revEcho that makes
all this possible could be automatically inferred from the function body if it were
not for Haskell's dreaded monomorphism restriction. In fact, in GHC 6.6 we have
the option of suppressing that restriction, in which case we need not provide the
signature revEcho :: FreeLike F_IO μ => μ (), and thus need not even be aware
of whether we program against the pure teletype IO model in the incarnation
of [11], our "magically improvable" variant of it, or indeed the standard Haskell
IO monad.

5 Related Work 

In this section we relate our work to two other strands of recent work that use 
two-level types in connection with monadic datatypes. 

5.1 Structuring Haskell IO by Combining Free Monads 

We have already mentioned the work by Swierstra and Altenkirch [11] on building
pure models of (parts of) the Haskell IO monad. Apart from teletype IO,
they also consider mutable state and concurrency. In both cases, the modelling
type is a free monad and thus amenable to our improvement method. In a recent
pearl [13, Sect. 7], Swierstra takes the modelling approach a step further. The
free monad structure is used to combine models for different aspects of Haskell
IO, and the models are not just used for reasoning and testing in a pure setting,
but also for actual effectful execution. The idea is that the types derived for
terms over the pure models are informative about just which kinds of effects
can occur during eventual execution. Clearly, there is an interpretative overhead
here, and somewhat startlingly this even affects the asymptotic complexity of
programs.

For example, for teletype IO the required execution function looks as follows,
referring to the original, effectful versions of getChar and putChar:

exec :: Free F_IO a -> IO a
exec (Return a)           = return a
exec (Wrap (GetChar f))   = Prelude.getChar >>= (exec . f)
exec (Wrap (PutChar c p)) = Prelude.putChar c >> exec p

Now, main = exec revEcho unfortunately has quadratic runtime behaviour, as is
already evident from simple experiments in which a text file with a few thousand
initial non-spaces is piped to the compiled version. This is in stark contrast to
running revEcho (with alternative type signature revEcho :: IO ()) directly against
the IO monad. Quite nicely, simply using main = exec (improve revEcho) recovers
the linear behaviour as well. So thanks to our improvement method for free monads,
which is orthogonal to Swierstra's "combination by coproducts" approach,
we can have both: pure modelling with informative types and efficient execution
without (too big) interpretative overhead. Of course, our improvement also
works for other cases of Swierstra's approach, such as his calculator example in
Sect. 6. Up to compatibility with Agda's dependent type system, it should also
apply to the models Swierstra and Altenkirch provide in [14] for computation
on (distributed) arrays, and should reap the same benefits there.

5.2 Short Cut Fusion for Monadic Computations 

Ghani et al. [15,16] observe that the augment combinator known from work on
short cut fusion [17,18] has a monadic interpretation, and thus enables fusion for
programs on certain monadic datatypes. This strand of work is thus the one most
closely related to ours, since it also aims to improve the efficiency of monadic
computations. An immediate difference is that Ghani et al.'s transformation can
at best achieve a linear speedup, but no improvement of asymptotic complexity.
More specifically, their approach does not allow for elimination of data structures
threaded through repeated layers of monadic binding inside a recursive
computation. Since the latter assertion seems somewhat in contradiction to the
authors' description, let us elaborate on what we mean here.

First of all, the cases of successful fusion presented in [15,16] as examples all
have the very specific form of a single consumer encountering a single producer,
that is, eliminating exactly one layer of intermediate data structure. The authors
suggest that sequences of >>= arising from do-notation lead to a rippling effect
that enables several layers to be eliminated in a row, but we could not reproduce
this. In particular, Ghani and Johann [16, Sect. 5] suggest that this happens for
the following kind of monadic evaluator:

data Expr = Add Expr Expr | ...

eval (Add e1 e2) =
  do
    x <- eval e1
    y <- eval e2
    return (x+y)

But actually, the above right-hand side desugars to

eval e1 >>= (\x -> eval e2 >>= (\y -> return (x+y)))    (15)

rather than to an expression of the supposed form (m >>= k1) >>= k2. In fact,
not a single invocation of the monadic short cut fusion rule is possible inside (15).



In contrast, our improvement approach is quite effective for eval. If, for example,
we complete the above to

data Expr = ... | Div Expr Expr | Lit Int

eval (Div e1 e2) =
  do
    y <- eval e2
    if y == 0 then fail "division by zero" else
      do
        x <- eval e1
        return (div x y)
eval (Lit i) = return i

and run it against the exception monad defined as follows:

data F_Exc b = Fail String

instance Functor F_Exc where
  fmap h (Fail s) = Fail s

fail s = wrap (Fail s)

then we find that while improve does not necessarily always give asymptotic 
improvements, it still reduces absolute runtimes here. Moreover, it turns out to 
have a beneficial impact on memory requirements. In particular, for expressions 
with deeply nested computations, such as 

deep n = foldl Add (Div (Lit 1) (Lit 0)) (map Lit [2..n]) 

we find that improve (eval (deep n)) :: Free F_Exc Int works fine for n that are
orders of magnitude bigger than ones for which eval (deep n) :: Free F_Exc Int
already leads to a stack overflow. An intuitive explanation here is that improve
essentially transforms the computation into continuation-passing style.

Clearly, just as for eval above, the monadic short cut fusion method proposed
by Ghani et al. [15,16] does not help with any of the earlier examples in this
paper. Maybe it is possible to bring it to bear on such examples by inventing
a suitable worker/wrapper scheme in the spirit of that applied by Gill [17] and
Chitil [19] to achieve asymptotic improvements via short cut fusion. If that can
be achieved at all for monadic short cut fusion, which is somewhat doubtful
due to complications involving polymorphic recursion and higher-kinded types,
it would definitely require extensive restructuring of the code to be improved,
much in contrast to our near-transparent approach.

On the other hand, Ghani et al.'s work is ahead of ours in terms of the monads
it can handle. Their fusion rule is presented for a notion of inductive monads
that covers free monads as a special case. More specifically, free monads are
inductive monads that arise as fixpoints of a bifunctor that, when applied to one
argument, gives the functor sum of the constant-valued functor returning that
fixed argument and some other arbitrary, but fixed, functor. In other words,
our Free φ corresponds to Mu (SumFunc φ) in the terminology of Ghani and
Johann [16, Ex. 14]. Most fusion success stories they report are actually for this
special kind of inductive monad and, as we have seen, all models of Swierstra
and Altenkirch live in the free monad subspace as well. But still, it would be
interesting to investigate a generalisation of our approach to inductive monads
other than the free ones, in particular to ones based on functor product instead
of functor sum above.



6 Conclusion 

We have developed a program transformation that, in essence, makes monadic 
substitution a constant-time operation and can have further benefits regard- 
ing stack consumption. Using the abstraction mechanisms provided by Haskell's 
type system, we were able to formulate it in such a way that it does not interfere 
with normal program construction. In particular, programmers need not a priori 
decide to use the improved representation of free monads. Instead, they can pro- 
gram against the ordinary representation with the only (and non-encumbering) 
proviso that it be captured as one instance of an appropriate type constructor 
class. This gives code that is identically structured and equally efficient to the 
one they would write as usual. When utilisation of the improved representation 
is desired (for example, because a quadratic overhead is observed), dropping it 
in a posteriori is as simple as adding a single call to improve at the appropriate 
place. This transparent switching between the equivalent representations also 
means that any equational reasoning about the potentially to be improved code 
can be based on the ordinary representation, which is, of course, beneficial for 
applications like the ones of Swierstra and Altenkirch, discussed in Sect. 5.1. 
(Or, formulated in terms of the example from Sect. 2: we can reason and think
about fullTree, as a special case of fullTree', even though actually fullTree'' will be
run eventually.¹)

The genericity that comes via two-level types is a boon for developing and 
reasoning about our method, but not an indispensable ingredient. It is always 
possible to obtain type constructors, classes, and improve-functions tailored to 
a particular datatype (as in Sect. 2). This is done by bundling and unbundling 
type isomorphisms as demonstrated by Ghani and Johann [16, App. A]. 
¹ An example of a property that is much simpler to prove for fullTree than for fullTree''
is the fact that the output trees produced for input n will only ever contain integers
from the interval 0 to n. While this has a straightforward proof by induction for
fullTree n, proving it for fullTree'' n Leaf requires a nontrivial generalisation effort to
find a good (i.e., general enough) induction hypothesis.